LeBlanc and Mellor-Crummey: Debugging Parallel Programs with Instant Replay
Abstract
The debugging cycle is the most common methodology for finding and correcting errors in sequential programs. Cyclic debugging is effective because sequential programs are usually deterministic. Debugging parallel programs is considerably more difficult because successive executions of the same program often do not produce the same results. In this paper we present a general solution for reproducing the execution behavior of parallel programs, termed Instant Replay. During program execution we save the relative order of significant events as they occur, not the data associated with such events. As a result, our approach requires less time and space to save the information needed for program replay than other methods. Our technique is not dependent on any particular form of interprocess communication. It provides for replay of an entire program, rather than individual processes in isolation. No centralized bottlenecks are introduced and there is no need for synchronized clocks or a globally consistent logical time. We describe a prototype implementation of Instant Replay on the BBN Butterfly Parallel Processor, and discuss how it can be incorporated into the debugging cycle for parallel programs.
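The idea of saving only the relative order of events, not their data, can be illustrated with a minimal sketch. It assumes a version counter on a shared object and a separate log per thread, and it handles only exclusive (write) accesses. The names `SharedObject`, `update`, and `run` are hypothetical, and the sketch is an illustration under these assumptions rather than the prototype described in the abstract.

```python
import threading


class SharedObject:
    """A shared value with a version counter (hypothetical helper)."""

    def __init__(self, value=0):
        self.value = value
        self.version = 0
        self._cond = threading.Condition()

    def update(self, fn, log, replay=False):
        """Apply fn to the value; record or enforce the access order."""
        with self._cond:
            if replay:
                # Block until the object reaches the version that this
                # access observed during the recorded run.
                expected = log.pop(0)
                self._cond.wait_for(lambda: self.version == expected)
            else:
                # Record only the version number, never the data.
                log.append(self.version)
            self.value = fn(self.value)
            self.version += 1
            self._cond.notify_all()


def run(ops, logs, replay):
    """Run one thread per entry in ops; each thread uses its own log."""
    obj = SharedObject(1)

    def worker(i):
        for fn in ops[i]:
            obj.update(fn, logs[i], replay)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(ops))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.value


if __name__ == "__main__":
    # Addition and doubling do not commute, so the result depends on
    # the interleaving chosen by the scheduler.
    ops = [[lambda v: v + 1] * 3, [lambda v: v * 2] * 3]
    logs = [[], []]
    recorded = run(ops, logs, replay=False)
    # Replay consumes the logs, so pass copies.
    replayed = run(ops, [list(log) for log in logs], replay=True)
    print(recorded, replayed)
```

Each thread keeps its own log, so the sketch has no central log to contend for, and the logs hold small integers regardless of how large the shared data is.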
Similar resources
Compiler Support for Analysis and Tuning Data Parallel Programs
Data parallel languages such as High-Performance Fortran (HPF) and Fortran D simplify the task of parallel programming by enabling users to express parallel algorithms at a high level. Compilers for these languages are responsible for realizing parallelism and inserting all interprocessor communication. For this reason, these compilers have detailed knowledge of the relationship between its...
An Efficient Logical Clock for Replaying Message-Passing Programs
Cyclic debugging is one of the most important and most commonly used activities in program development. During cyclic debugging, the program is repeatedly re-executed to track down errors when a failure has been observed. The cyclic debugging approach often fails for parallel programs because parallel programs reveal nondeterministic characteristics due to message race conditions. Execution re...
The ParaScope Parallel Programming Environment
The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope's compilation system, its parallel program editor, and its parallel debugger. The compilation system extends the traditional s...
IDLI: An Interactive Message Debugger for Parallel Programs Using LAM-MPI
Many complex and computation-intensive problems can be solved efficiently using parallel programs on a network of processors. One of the most widely used software platforms for such cluster computing is LAM-MPI. To aid development of robust parallel programs using LAM-MPI we need efficient debugging tools. However, the challenges in debugging parallel programs are unique and different from those...
Execution replay of parallel programs
Debugging MIMD programs is often a delicate job. As a matter of fact, they can have different behaviors in successive executions. So, cyclic debugging is not applicable. To make it available for parallel programmers, we propose execution replay (full and partial) for our multi-threaded execution model, the Communicating Active Components (CAC). CACs have been defined to implement Parallel Objec...